Abstract

With the advent of structured data in the form of social networks, genetic circuits
and protein interaction networks, statistical analysis of networks has gained popularity
over recent years. The stochastic block model constitutes a classical cluster-exhibiting
random graph model for networks. There is a substantial amount of literature devoted to
proposing strategies for estimating and inferring parameters of the model, both from
classical and Bayesian viewpoints. Unlike the classical counterpart, however, there is
a dearth of theoretical results on the accuracy of estimation in the Bayesian setting.

In this article, we undertake a theoretical investigation of the posterior distribution of
the parameters in a stochastic block model. In particular, we show that one obtains
optimal rates of posterior convergence with routinely used multinomial-Dirichlet priors
on cluster indicators and uniform priors on the probabilities of the random edge
indicators. En route, we develop geometric embedding techniques to exploit the lower
dimensional structure of the parameter space, which may be of independent interest.

Keywords: Bayesian; block models; clustering; multinomial-Dirichlet; network; posterior 
contraction; random graph 




1 Introduction 


Data available in the form of networks are becoming increasingly common in applications
ranging from brain connectivity, protein interactions, web applications and social networks,
to name a few, motivating an explosion of activity in the statistical analysis of networks in
recent years (Goldenberg et al., 2010). Estimating large networks offers unique challenges
in terms of structured dimension reduction and estimation in stylized domains, necessitating
new tools for inference. A rich variety of probabilistic models have been studied
for network estimation, ranging from the classical Erdős and Rényi graphs (Erdős & Rényi,
1961), exponential random graph models (Holland & Leinhardt, 1981), stochastic block
models (Holland et al., 1983), Markov graphs (Frank & Strauss, 1986) and latent space
models (Hoff et al., 2002), to name a few.


In a network with $n$ nodes, there are $O(n^2)$ possible connections between pairs of
nodes, the exact number depending on whether the network is directed or undirected and
whether self-loops are permitted. A common goal of the parametric models mentioned
previously is to parsimoniously represent the $O(n^2)$ probabilities of connections between
pairs of nodes in terms of fewer parameters. The stochastic block model achieves this by
clustering the nodes into $k \ll n$ groups, with the probability of an edge between two nodes
solely dependent on their cluster memberships. The block model originated in the
mathematical sociology literature (Holland et al., 1983), with subsequent widespread
applications in statistics (Wang & Wong, 1987; Snijders & Nowicki, 1997; Nowicki &
Snijders, 2001).


In particular, the clustering property of block models offers a natural way to find
communities within networks (Bickel & Chen, 2009; Newman, 2012; Amini et al., 2013;
Zhao et al., 2012; Karrer & Newman, 2011; Zhao et al., 2011). Various modifications of
the stochastic block model have also been proposed, including the mixed membership
stochastic block model (Airoldi et al., 2009) and the degree-corrected stochastic block
model (Zhao et al., 2012).


Statistical accuracy of parameter inference in the stochastic block model is of growing
interest, with one of the objects of interest being the $n \times n$ matrix of probabilities of
edges between pairs of nodes, which we shall denote by $\theta = (\theta_{ij})$. Using a singular-value
thresholding approach, Chatterjee (2014) obtained a $\sqrt{k/n}$ rate for estimating $\theta$ with
respect to the squared distance in a $k$-component stochastic block model. In a recent
technical report, Gao et al. (2014) obtained an improved $k^2/n^2 + \log k/n$ rate by considering
a least-squares type estimator. They also showed that the resulting rate is minimax-optimal;
interestingly, the minimax rate comprises two parts, which Gao et al. (2014)
refer to as the nonparametric and clustering rates respectively. Among other related work,
Bickel et al. (2013) provided conditions for asymptotic normality of maximum likelihood
estimators in stochastic block models.

In this article, we consider a Bayesian formulation of a stochastic block model where $\theta$ is
equipped with a hierarchical prior and study the convergence of the posterior distribution
assuming the data to be generated from a stochastic block model. We show that one obtains
the minimax rate of posterior convergence with essentially automatic prior choices, such
as multinomial-Dirichlet priors on cluster indicators and uniform priors on the probabilities
of the random edge indicators. There is an existing literature (…, 2003; Snijders &
Nowicki, 1997; Nowicki & Snijders, 2001; Golightly & Wilkinson, 2005; McDaid et al.,
2013) on posterior sampling and inference in the stochastic block
model. However, to the best of our knowledge, the present paper is the first to study the
asymptotic properties of Bayesian estimation in stochastic block models.

Theoretical investigation of the posterior distribution in block models offers some
unique challenges relative to the small but growing literature on posterior convergence
in high-dimensional sparse problems (Castillo & van der Vaart, 2012; Castillo et al., 2015;
Pati et al., 2014; Banerjee & Ghosal, 2014). When a large subset of the parameters
are exactly or approximately zero, the sparsity assumption can be exploited to reduce the
complexity of the model space to derive tests for the true parameter versus the complement
of a neighborhood of the true parameter (Castillo & van der Vaart, 2012; Pati et al.,
2014). It is now well appreciated that constructing such tests plays a crucial role in
posterior asymptotics (Ghosal et al., 2000; Giné & Nickl, 2011). In the present setting, we
exploit the parsimonious structure of the parameter space as a result of clustering of $n$
nodes into $k < n$ communities and develop geometric embedding techniques to derive such
tests.


2 Preliminaries 


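For $S \subset \mathbb{R}$, we shall denote the set of all $d \times d$ matrices with entries in $S$ by $S^{d \times d}$. For
any $B = (B_{ll'}) \in \mathbb{R}^{d \times d}$, we denote the Euclidean (equivalently Frobenius) norm of $B$ by
$\|B\| = \sqrt{\sum_{l=1}^{d} \sum_{l'=1}^{d} B_{ll'}^2}$. Given $X^* \in \mathbb{R}^{d \times d}$ and $W \in (0,\infty)^{d \times d}$, let $\mathcal{E}_{d^2}(X^*; W)$ denote the unit
ellipsoid with center $X^*$ and weight $W$ given by

$$\mathcal{E}_{d^2}(X^*; W) = \Big\{ X \in \mathbb{R}^{d \times d} : \sum_{l=1}^{d} \sum_{l'=1}^{d} W_{ll'} (X_{ll'} - X^*_{ll'})^2 \le 1 \Big\}. \tag{1}$$

Viewed as a subset of $\mathbb{R}^{d^2}$, the Euclidean volume of $\mathcal{E}_{d^2}(X^*; W)$, denoted by $|\mathcal{E}_{d^2}(X^*; W)|$,
is

$$|\mathcal{E}_{d^2}(X^*; W)| = \frac{\pi^{d^2/2}}{\Gamma(d^2/2 + 1)} \prod_{l=1}^{d} \prod_{l'=1}^{d} W_{ll'}^{-1/2}. \tag{2}$$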
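As a quick numerical companion to the volume formula (2), the following minimal sketch evaluates it for a flattened weight matrix. The function name `ellipsoid_volume` and the choice to pass $W$ as a flat list are assumptions made for illustration only.

```python
import math

def ellipsoid_volume(weights):
    """Volume of {x : sum_i w_i (x_i - c_i)^2 <= 1} in R^m, with m = len(weights).

    `weights` is the flattened d x d weight matrix W, so m = d^2.
    The centre c does not affect the volume, so it is not an argument.
    """
    m = len(weights)
    unit_ball = math.pi ** (m / 2) / math.gamma(m / 2 + 1)
    # Each semi-axis has length w_i^{-1/2}.
    return unit_ball * math.prod(w ** -0.5 for w in weights)
```

For $d = 1$ and $W = 1$ the set is the interval $[c - 1, c + 1]$, and the function returns its length; quadrupling the weight halves the semi-axis and hence the length.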


Given sequences $\{a_n\}$, $\{b_n\}$, $a_n \lesssim b_n$ indicates there exists a constant $K > 0$ such that
$a_n \le K b_n$ for all large $n$. We say $a_n \asymp b_n$ when $a_n \lesssim b_n$ and $b_n \lesssim a_n$. Throughout, $C, C'$
denote positive constants whose values might change from one line to the next.


3 Stochastic Block models 

Let $A = (A_{ij}) \in \{0,1\}^{n \times n}$ denote the adjacency matrix of a network with $n$ nodes, with
$A_{ij} = 1$ indicating the presence of an edge from node $i$ to node $j$ and $A_{ij} = 0$ indicating
a lack thereof. To keep the subsequent notation clean, we shall consider directed networks
with self-loops, so that $A_{ij}$ and $A_{ji}$ need not be the same and $A_{ii}$ can be both 0 and 1.
Our theoretical results can be trivially modified to undirected networks with or without
self-loops.

Let $\theta_{ij}$ denote the probability of an edge from node $i$ to $j$, with $A_{ij} \sim \mathrm{Bernoulli}(\theta_{ij})$
independently for $1 \le i, j \le n$. A stochastic block model postulates that the nodes
are clustered into communities, with the probability of an edge between two nodes solely
dependent on their community memberships. Specifically, let $z_i \in \{1, \ldots, k\}$ denote the
cluster membership of the $i$th node and $Q = (Q_{rs}) \in [0,1]^{k \times k}$ be a matrix of probabilities,
with $Q_{rs}$ indicating the probability of an edge from any node $i$ in cluster $r$ to any node $j$
in cluster $s$. With these notations, a $k$-component stochastic block model is given by

$$A_{ij} \sim \mathrm{Bernoulli}(\theta_{ij}), \quad \theta_{ij} = Q_{z_i z_j}. \tag{3}$$

We use $E_\theta / P_\theta$ to denote an expectation/probability under the sampling mechanism (3).
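The sampling mechanism (3) can be sketched in a few lines. This is an illustrative simulation only: the helper names `theta_from` and `sample_adjacency`, the 0-based cluster labels, and the particular values of `z` and `Q` are assumptions and not part of the paper.

```python
import random

def theta_from(z, Q):
    """Edge-probability matrix with theta[i][j] = Q[z[i]][z[j]] (labels are 0-based)."""
    return [[Q[zi][zj] for zj in z] for zi in z]

def sample_adjacency(z, Q, rng):
    """Draw a directed adjacency matrix with self-loops, A[i][j] ~ Bernoulli(theta[i][j])."""
    theta = theta_from(z, Q)
    return [[1 if rng.random() < p else 0 for p in row] for row in theta]

rng = random.Random(0)          # fixed seed, for reproducibility only
z = [0, 0, 1, 1, 1]             # hypothetical clustering of n = 5 nodes into k = 2 groups
Q = [[0.9, 0.1], [0.2, 0.7]]    # hypothetical block probabilities
A = sample_adjacency(z, Q, rng)
```

Swapping the two cluster labels in `z` and permuting the rows and columns of `Q` accordingly yields the same edge-probability matrix, which illustrates why a pair $(z, Q)$ is not uniquely determined by $\theta$.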

The stochastic block model clearly imposes a parsimonious structure on the node
probabilities $\theta = (\theta_{ij})$ when $k \ll n$, reducing the effective number of parameters from $O(n^2)$ to
$O(k^2 + n)$. To describe the parameter space for $\theta$, we need to introduce some notation. For
$k < n$, let $\mathcal{Z}_{n,k} = \{(z_1, \ldots, z_n) : z_i \in \{1, \ldots, k\}, 1 \le i \le n\}$ denote all possible clusterings
of $n$ nodes into $k$ clusters. Elements of $\mathcal{Z}_{n,k}$ will be denoted by $z = (z_1, \ldots, z_n)$. For any
$1 \le r \le k$, $z^{-1}(r)$ is used as a shorthand for $\{1 \le i \le n : z_i = r\}$, the nodes belonging
to cluster $r$. When $z$ is clear from the context, we shall use $n_r = |z^{-1}(r)|$ to denote the
number of nodes in cluster $r$; clearly $\sum_{r=1}^{k} n_r = n$. With these notations, the parameter
space $\Theta_k$ for $\theta$ is given by

$$\Theta_k = \big\{ \theta \in [0,1]^{n \times n} : \theta_{ij} = Q_{z_i z_j}, \; z \in \mathcal{Z}_{n,k}, \; Q \in [0,1]^{k \times k} \big\}. \tag{4}$$

For any $z \in \mathcal{Z}_{n,k}$ and $Q \in [0,1]^{k \times k}$, we denote the corresponding $\theta \in \Theta_k$ by $\theta^{z,Q}$, so that
$\theta^{z,Q}_{ij} = Q_{z_i z_j}$. In fact, $(z, Q) \mapsto \theta^{z,Q}$ is a surjective map from $\mathcal{Z}_{n,k} \times [0,1]^{k \times k}$ to $\Theta_k$, though
it is clearly not injective.

Given $z \in \mathcal{Z}_{n,k}$, let $A_{[rs]}$ denote the $n_r \times n_s$ sub-matrix of $A$ consisting of entries $A_{ij}$
with $z_i = r$ and $z_j = s$. The joint likelihood of $A$ under model (3) can be expressed as

$$P(A \mid z, Q) = \prod_{r=1}^{k} \prod_{s=1}^{k} P(A_{[rs]} \mid z, Q), \quad P(A_{[rs]} \mid z, Q) = \prod_{i : z_i = r} \prod_{j : z_j = s} Q_{rs}^{A_{ij}} (1 - Q_{rs})^{1 - A_{ij}}. \tag{5}$$

A Bayesian specification of the stochastic block model can be completed by assigning
independent priors to $z$ and $Q$, which in turn induces a prior on $\Theta_k$ via the mapping
$(z, Q) \mapsto \theta^{z,Q}$. We generically use $p(z, Q) = p(z) p(Q)$ to denote the joint prior on $z$ and $Q$.
The induced prior on $\Theta_k$ will be denoted by $\Pi(\cdot)$ and the corresponding posterior given
data $A = (A_{ij})$ will be denoted by $\Pi_n(\cdot \mid A)$. The following fact is useful and heavily used
in the sequel: for any $U \subset \Theta_k$,

$$\Pi(U) = \sum_{z \in \mathcal{Z}_{n,k}} \Pi(U \mid z) \, p(z) = \sum_{z \in \mathcal{Z}_{n,k}} p(Q : \theta^{z,Q} \in U) \, p(z), \tag{6}$$

where the second equality uses the independence of $z$ and $Q$. Specific choices of $p(z)$ and
$p(Q)$ are discussed below.

We assume independent $U(0,1)$ priors on $Q_{rs}$. We consider a hierarchical prior on $z$
where each node has probability $\pi_r$ of being allocated to the $r$th cluster independently of the
other nodes, and the vector of probabilities $\pi = (\pi_1, \ldots, \pi_k)$ follows a $\mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$
prior. Here $\alpha_1, \ldots, \alpha_k$ are fixed hyper-parameters that do not depend on $k$ or $n$; a default
choice is $\alpha_r = 1/2$ for all $r = 1, \ldots, k$. Model (3) along with the prior specified above can
be expressed hierarchically as follows:

$$Q_{rs} \overset{\text{ind}}{\sim} U(0,1), \quad r, s = 1, \ldots, k, \tag{7}$$

$$P(z_i = r \mid \pi) = \pi_r, \quad r = 1, \ldots, k, \; i = 1, \ldots, n, \tag{8}$$

$$\pi \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k), \tag{9}$$

$$A_{ij} \mid z, Q \overset{\text{ind}}{\sim} \mathrm{Bernoulli}(\theta_{ij}), \quad \theta_{ij} = Q_{z_i z_j}. \tag{10}$$
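A draw from the prior part (7)-(9) of this hierarchy can be sketched as follows. The function name `sample_prior`, the 0-based labels and the example sizes are assumptions for illustration; the Dirichlet draw uses the standard representation as normalised independent Gamma variates.

```python
import random

def sample_prior(n, k, alpha, rng):
    """One draw of (pi, z, Q) from the hierarchy (7)-(9)."""
    # Dirichlet(alpha) via normalised independent Gamma(alpha_r, 1) variates.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    pi = [x / total for x in g]
    # z_i = r with probability pi_r, independently over nodes (0-based labels).
    z = rng.choices(range(k), weights=pi, k=n)
    # Q_rs ~ U(0, 1) independently.
    Q = [[rng.random() for _ in range(k)] for _ in range(k)]
    return pi, z, Q

rng = random.Random(1)  # fixed seed, for reproducibility only
pi, z, Q = sample_prior(n=20, k=3, alpha=[0.5, 0.5, 0.5], rng=rng)
```

Given such a draw, the data layer (10) is an independent Bernoulli draw for each ordered pair of nodes with success probability `Q[z[i]][z[j]]`.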


A hierarchical specification as in (or very similar to) (7)-(10) has been commonly
used in the literature; see, for example, Snijders & Nowicki (1997); Nowicki & Snijders
(2001); Golightly & Wilkinson (2005); McDaid et al. (2013). Analytic marginalizations
can be carried out due to the conjugate nature of the prior, facilitating posterior sampling
(McDaid et al., 2013). In particular, using standard multinomial-Dirichlet conjugacy, the
marginal prior of $z$ can be written as

$$p(z) = \frac{\Gamma\big(\sum_{r=1}^{k} \alpha_r\big)}{\Gamma\big(n + \sum_{r=1}^{k} \alpha_r\big)} \prod_{r=1}^{k} \frac{\Gamma(n_r + \alpha_r)}{\Gamma(\alpha_r)}, \quad z \in \mathcal{Z}_{n,k}, \tag{11}$$

where recall that $n_r = \sum_{i=1}^{n} 1(z_i = r)$. The following lemma provides a bound on the
ratio of prior probabilities $p(z)/p(z')$, which is used subsequently in the proof of our main
theorem.
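The marginal prior (11) is straightforward to evaluate on the log scale. The sketch below, with the hypothetical helper `log_prior_z` and 0-based labels, also checks numerically on a tiny example that $p(z)$ sums to one over all $k^n$ clusterings; the brute-force enumeration is only feasible for very small $n$.

```python
import math
from itertools import product

def log_prior_z(z, k, alpha):
    """log p(z) from (11); labels in z are 0-based and alpha has length k."""
    n = len(z)
    counts = [0] * k
    for zi in z:
        counts[zi] += 1
    a_sum = sum(alpha)
    out = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for n_r, a_r in zip(counts, alpha):
        out += math.lgamma(n_r + a_r) - math.lgamma(a_r)
    return out

# Sanity check: p(z) sums to one over all k^n clusterings (small n only).
n, k, alpha = 4, 3, [0.5, 0.5, 0.5]
total = sum(math.exp(log_prior_z(z, k, alpha)) for z in product(range(k), repeat=n))
```

Since $p(z)$ depends on $z$ only through the cluster sizes, the ratio $p(z)/p(z')$ is a ratio of products of Gamma functions of the cluster sizes.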


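Lemma 3.1. Assume $z' \in \mathcal{Z}_{n,k}$ with $n'_r = \sum_{i=1}^{n} 1(z'_i = r) \ge 1$ for all $r = 1, \ldots, k$. Then,
$\sup_{z \in \mathcal{Z}_{n,k}} p(z)/p(z') \le e^{C n \log k}$, where $C$ is a positive constant.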

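Proof. Fix $z \in \mathcal{Z}_{n,k}$. From (11), $p(z)/p(z') = \prod_{r=1}^{k} \Gamma(n_r + \alpha_r)/\Gamma(n'_r + \alpha_r)$. Define
non-negative integers $\beta_r = \lceil \alpha_r \rceil$, $\gamma_r = \lfloor \alpha_r \rfloor$, $\beta_\cdot = \sum_{r=1}^{k} \beta_r$ and $\gamma_\cdot = \sum_{r=1}^{k} \gamma_r$. Recall the following
facts about the Gamma function: (i) $\Gamma(x)$ is decreasing on $(0,1)$ with $1/(2x) \le \Gamma(x) \le 1/x$,
(ii) $\Gamma(x) \le 1$ for $x \in [1,2]$, $\Gamma(1) = \Gamma(2) = 1$ and (iii) $\Gamma(x)$ is increasing for $x \ge 2$. First,
we claim $\Gamma(n'_r + \alpha_r) \ge C' \, \Gamma(n'_r + \gamma_r)$ for all $r$ for an absolute constant $C' > 0$. To see this,
recalling that $n'_r \ge 1$, separately consider the cases (a) $n'_r \ge 2$, (b) $n'_r = 1$ and $\alpha_r \ge 1$ and
(c) $n'_r = 1$ and $\alpha_r < 1$. Cases (a) and (b) follow from fact (iii) above with $C' = 1$. For
case (c), $\Gamma(n'_r + \alpha_r) = \Gamma(1 + \alpha_r) = \alpha_r \Gamma(\alpha_r) \ge 1/2$ by fact (i) and $\Gamma(n'_r + \gamma_r) = \Gamma(1) = 1$;
therefore one may choose $C' = 1/2$ here. Next, we claim that $\Gamma(n_r + \alpha_r) \le C \, \Gamma(n_r + \beta_r)$ for
all $r$ and an absolute constant $C > 0$. To see this, separately consider cases (a) $n_r \ge 1$, (b)
$n_r = 0$ and $\alpha_r \ge 1$ and (c) $n_r = 0$ and $\alpha_r < 1$. Cases (a) and (b) once again follow from
facts (ii) and (iii) above with $C = 1$. For case (c), $\Gamma(n_r + \alpha_r) = \Gamma(\alpha_r)$ and $\Gamma(n_r + \beta_r) = \Gamma(1) = 1$,
so one may choose $C = \max_{1 \le r \le k} \Gamma(\alpha_r)$. We thus have

$$\frac{p(z)}{p(z')} = \prod_{r=1}^{k} \frac{\Gamma(n_r + \alpha_r)}{\Gamma(n'_r + \alpha_r)} \le \Big(\frac{C}{C'}\Big)^{k} \prod_{r=1}^{k} \frac{\Gamma(n_r + \beta_r)}{\Gamma(n'_r + \gamma_r)} = \Big(\frac{C}{C'}\Big)^{k} \frac{(n + \beta_\cdot - k)!}{(n + \gamma_\cdot - k)!} \, \frac{\binom{n + \gamma_\cdot - k}{n'_1 + \gamma_1 - 1, \ldots, n'_k + \gamma_k - 1}}{\binom{n + \beta_\cdot - k}{n_1 + \beta_1 - 1, \ldots, n_k + \beta_k - 1}}, \tag{12}$$

where we used $\sum_{r=1}^{k} n_r = \sum_{r=1}^{k} n'_r = n$, and $\binom{m}{m_1, \ldots, m_k} := m!/(m_1! \cdots m_k!) \, 1(m_1 + \cdots + m_k = m)$
is the multinomial coefficient. Since the multinomial coefficient, as a function of
$(m_1, \ldots, m_{k-1})$, attains its minimum if $m_r = m$ for some $r$ and $m_l = 0$ for
$l \ne r$, and its maximum if $m_r = m/k$ for all $r = 1, \ldots, k$, we have from (12),

$$\frac{p(z)}{p(z')} \le \Big(\frac{C}{C'}\Big)^{k} \frac{(n + \beta_\cdot - k)!}{(n + \gamma_\cdot - k)!} \, \frac{(n + \gamma_\cdot - k)!}{[\{(n + \gamma_\cdot - k)/k\}!]^{k}} = \Big(\frac{C}{C'}\Big)^{k} \frac{(n + \beta_\cdot - k)!}{[\{(n + \gamma_\cdot - k)/k\}!]^{k}}.$$

Set $m = n + \gamma_\cdot - k$. Since $m! \ge c \, (m/e)^m \sqrt{m}$ by Stirling's bound, $m!/\{(m/k)!\}^{k} \le e^{C m \log k}$
for $C > 1$ large enough. The conclusion follows.

□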


4 Posterior convergence rates in Stochastic Block Models 


We are interested in concentration properties of the posterior $\Pi_n(\cdot \mid A)$ assuming the true
data-generating parameter $\theta^0 \in \Theta_k$. To measure the discrepancy in the estimation of
$\theta^0 \in \Theta_k$, the mean squared error has been used in the frequentist literature,

$$\frac{1}{n^2} \|\hat\theta - \theta^0\|^2 = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\hat\theta_{ij} - \theta^0_{ij})^2, \tag{13}$$

where $\hat\theta$ is an estimator of $\theta^0$. Chatterjee (2014) proposed estimating $\theta^0$ using a low
rank decomposition of the adjacency matrix $A$ followed by a singular value decomposition
to obtain a convergence rate of $\sqrt{k/n}$. More recently, Gao et al. (2014) considered a
least squares type approach which can be related to maximum likelihood estimation where
the Bernoulli likelihood is replaced by a Gaussian likelihood. They obtained a rate of
$k^2/n^2 + \log k/n$, which they additionally showed to be the minimax rate over $\Theta_k$, i.e.,

$$\inf_{\hat\theta} \sup_{\theta^0 \in \Theta_k} E_{\theta^0} \frac{1}{n^2} \|\hat\theta - \theta^0\|^2 \asymp \frac{k^2}{n^2} + \frac{\log k}{n}. \tag{14}$$

Interestingly, the minimax rate has two components, $k^2/n^2$ and $\log k/n$. Gao et al. (2014)
refer to the $k^2/n^2$ term in the minimax rate as the nonparametric rate, since it arises from
the need to estimate $k^2$ unknown elements in $Q$ from $n^2$ observations. The second part,
$\log k/n$, is termed the clustering rate, which appears since the clustering configuration
$z$ is unknown and needs to be estimated from the data. Observe that the clustering rate
grows logarithmically in $k$. Parameterizing $k = n^{\delta}$ with $\delta \in [0,1]$, the interplay between the
two components becomes clearer (refer to equation 2.6 of Gao et al. (2014)); in particular,
the clustering rate dominates when $k$ is small and the nonparametric rate dominates when
$k$ is large.
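The interplay between the two components of (14) can be inspected numerically. The following sketch is purely illustrative; the function name `rate_components` and the choice $n = 10{,}000$ are assumptions.

```python
import math

def rate_components(n, k):
    """Return (nonparametric, clustering) = (k^2 / n^2, log(k) / n)."""
    return k ** 2 / n ** 2, math.log(k) / n

n = 10_000
for delta in (0.25, 0.5, 0.75):
    k = round(n ** delta)  # k = n^delta, rounded to an integer
    nonpar, clust = rate_components(n, k)
    dominant = "nonparametric" if nonpar > clust else "clustering"
    print(f"delta={delta}: k={k}, k^2/n^2={nonpar:.2e}, log(k)/n={clust:.2e}, dominant={dominant}")
```

The crossover occurs where $k^2/n^2$ and $\log k/n$ balance, that is, when $k$ is of the order $\sqrt{n \log k}$.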

To evaluate Bayesian procedures from a frequentist standpoint, one seeks the minimum
possible sequence $\epsilon_n \to 0$ such that the posterior probability assigned to the
complement of an $\epsilon_n$-neighborhood (blown up by a constant factor) of $\theta^0$ is vanishingly
small. The smallest such $\epsilon_n$ is called the posterior convergence rate
(Ghosal et al., 2000). There is now a growing body of literature showing that Bayesian
procedures achieve the frequentist minimax rate of posterior contraction (up to a
logarithmic term) in models where the parameter dimension grows with the sample size; see
Bontemps (2011); Castillo & van der Vaart (2012); van der Pas et al. (2014); Castillo
et al. (2015); Pati et al. (2014); Banerjee & Ghosal (2014) for some flavor of the recent
literature.

We now state the main result of this article, where we derive the convergence rate of
the posterior arising from the hierarchical formulation (7)-(10).


Theorem 4.1. Assume $A = (A_{ij})$ is generated from a $k$-component stochastic block model
(3) with the true data-generating parameter $\theta^0 = (\theta^0_{ij}) \in \Theta_k$, where $\Theta_k$ is as in (4).
Further assume that there exists a small constant $\delta \in (0,1)$ such that $\theta^0_{ij} \in (\delta, 1 - \delta)$ for
all $i, j = 1, \ldots, n$. Suppose the hierarchical Bayesian model (7)-(10) is fitted. Then, with
$\epsilon_n^2 = k^2 \log(n/k)/n^2 + \log k/n$ and a sufficiently large constant $M > 0$,

$$\lim_{n \to \infty} E_{\theta^0} \Pi_n \Big\{ \frac{1}{n^2} \|\theta - \theta^0\|^2 > M^2 \epsilon_n^2 \;\Big|\; A \Big\} = 0. \tag{15}$$

Remark 4.1. Since $\theta^0 \in \Theta_k$, following the discussion after (4), there exist $z^0 \in \mathcal{Z}_{n,k}$ and
$Q^0 \in [0,1]^{k \times k}$ such that $\theta^0 = \theta^{z^0, Q^0}$. The condition of the theorem posits that all entries
of $Q^0$ lie in $(\delta, 1 - \delta)$. The assumption $\theta^0 \in \Theta_k$ also implicitly implies that all the clusters
have at least one observation, i.e., $n_{0r} = \sum_{i=1}^{n} 1(z^0_i = r) \ge 1$ for all $r = 1, \ldots, k$; otherwise
$\theta^0 \in \Theta_l$ for some $l < k$.

A proof of Theorem 4.1 can be found in Section 5. Theorem 4.1 shows that the posterior
contracts at a (near) minimax rate of $k^2 \log(n/k)/n^2 + \log k/n$. The nonparametric
component of the rate is slightly hurt by a logarithmic term; the appearance of such an
additional logarithmic term is common in Bayesian nonparametrics. An inspection of the proof
additionally reveals that the techniques can be trivially extended to undirected graphs with
or without self-loops and will produce the same rate.

4.1 Geometry of $\Theta_k$

In this section, we derive a number of auxiliary results aimed at understanding the geometry
of the parameter space $\Theta_k$. These results are used to prove Theorem 4.1 and may be
of independent interest.

We first state a testing lemma which harnesses the ability of the likelihood to separate 
points in the parameter space. 

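Lemma 4.2. Assume $\theta^1 \ne \theta^0 \in \Theta_k$ and let $E = \{\theta \in [0,1]^{n \times n} : \|\theta - \theta^1\| \le \|\theta^1 - \theta^0\|/2\}$
be a Euclidean ball of radius $\|\theta^1 - \theta^0\|/2$ around $\theta^1$ inside $[0,1]^{n \times n}$. Based on $A_{ij} \overset{\text{ind}}{\sim}
\mathrm{Bernoulli}(\theta_{ij})$ for $i, j = 1, \ldots, n$, consider testing $H_0 : \theta = \theta^0$ versus $H_1 : \theta \in E$. There
exists a test function $\Phi$ such that

$$E_{\theta^0}(\Phi) \le \exp\{-C_1 \|\theta^1 - \theta^0\|^2\}, \quad \sup_{\theta \in E} E_{\theta}(1 - \Phi) \le \exp\{-C_2 \|\theta^1 - \theta^0\|^2\}, \tag{16}$$

for constants $C_1, C_2 > 0$ independent of $n$, $\theta^1$ and $\theta^0$.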

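Proof. Define the test function $\Phi$ as

$$\Phi = 1\Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} - \theta^0_{ij})(\theta^1_{ij} - \theta^0_{ij}) > \|\theta^1 - \theta^0\|^2/4 \Big\},$$

where $1(\cdot)$ denotes the indicator of a set. We show below that this test has the desired
error rates (16).

We first bound the type-I error $E_{\theta^0}(\Phi)$. Noting that under $P_{\theta^0}$, $(A_{ij} - \theta^0_{ij})$ are
independent zero mean random variables with $|A_{ij} - \theta^0_{ij}| \le 1$, we use a version of Hoeffding's
inequality (refer to Proposition 5.10 of Vershynin (2010)) to conclude that

$$E_{\theta^0}(\Phi) = P_{\theta^0}\Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} - \theta^0_{ij})(\theta^1_{ij} - \theta^0_{ij}) > \|\theta^1 - \theta^0\|^2/4 \Big\} \le \exp\Big\{ -C_1 \frac{\|\theta^1 - \theta^0\|^4}{\|\theta^1 - \theta^0\|^2} \Big\} = \exp\{-C_1 \|\theta^1 - \theta^0\|^2\}$$

for a constant $C_1 > 0$ independent of $n$, $\theta^1$ and $\theta^0$.

We next bound the type-II error $\sup_{\theta \in E} E_{\theta}(1 - \Phi)$. Fix $\theta \in E$. We have

$$E_{\theta}(1 - \Phi) = P_{\theta}\Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} - \theta^0_{ij})(\theta^1_{ij} - \theta^0_{ij}) \le \|\theta^1 - \theta^0\|^2/4 \Big\} = P_{\theta}\Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} - \theta_{ij})(\theta^1_{ij} - \theta^0_{ij}) \le \|\theta^1 - \theta^0\|^2/4 - \langle \theta - \theta^0, \theta^1 - \theta^0 \rangle \Big\}, \tag{17}$$

where we abbreviate $\langle \theta', \theta'' \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \theta'_{ij} \theta''_{ij}$. Bound

$$\langle \theta - \theta^0, \theta^1 - \theta^0 \rangle = \langle \theta^1 - \theta^0, \theta^1 - \theta^0 \rangle - \langle \theta^1 - \theta^0, \theta^1 - \theta \rangle \ge \|\theta^1 - \theta^0\|^2 - \|\theta^1 - \theta^0\| \, \|\theta^1 - \theta\| \ge \|\theta^1 - \theta^0\|^2 - \|\theta^1 - \theta^0\|^2/2 = \|\theta^1 - \theta^0\|^2/2,$$

where the penultimate step used the Cauchy-Schwarz inequality along with the fact that
$\|\theta - \theta^1\| \le \|\theta^1 - \theta^0\|/2$. Substituting in (17) and noting that under $P_{\theta}$, $(A_{ij} - \theta_{ij})$
are independent zero mean bounded random variables, another application of Hoeffding's
inequality yields

$$E_{\theta}(1 - \Phi) \le P_{\theta}\Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (A_{ij} - \theta_{ij})(\theta^1_{ij} - \theta^0_{ij}) \le -\|\theta^1 - \theta^0\|^2/4 \Big\} \le \exp\Big\{ -C_2 \frac{\|\theta^1 - \theta^0\|^4}{\|\theta^1 - \theta^0\|^2} \Big\} = \exp\{-C_2 \|\theta^1 - \theta^0\|^2\}$$

for some constant $C_2 > 0$ independent of $n$ and $\theta$. Taking a supremum over $\theta \in E$ yields
the desired result. □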
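The test used in the proof above thresholds the statistic $\sum_{i,j} (A_{ij} - \theta^0_{ij})(\theta^1_{ij} - \theta^0_{ij})$ at $\|\theta^1 - \theta^0\|^2/4$. A minimal sketch follows; the function name `lemma_test` and the nested-list representation of matrices are assumptions for illustration.

```python
def lemma_test(A, theta0, theta1):
    """Return 1 (reject H0: theta = theta0) if
    sum_ij (A_ij - theta0_ij) * (theta1_ij - theta0_ij) > ||theta1 - theta0||^2 / 4,
    and 0 otherwise."""
    n = len(A)
    stat = 0.0
    sq_dist = 0.0
    for i in range(n):
        for j in range(n):
            diff = theta1[i][j] - theta0[i][j]
            stat += (A[i][j] - theta0[i][j]) * diff
            sq_dist += diff * diff
    return 1 if stat > sq_dist / 4 else 0
```

For instance, if $\theta^0$ has all entries 0.2 and $\theta^1$ has all entries 0.8, a fully connected graph leads to rejection while an empty graph does not.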

Our next result is concerned with the structure of a specific type of Euclidean balls
inside $\Theta_k$. Recall that $\theta^{z,Q}$ denotes the element of $\Theta_k$ with $\theta^{z,Q}_{ij} = Q_{z_i z_j}$. For $z \in \mathcal{Z}_{n,k}$, let

$$\Theta_k(z) = \{\theta^{z,Q} : Q \in [0,1]^{k \times k}\} \tag{18}$$

denote a slice of $\Theta_k$ along $z$. In other words, given $z$, $\Theta_k(z)$ is the image of the map
$Q \mapsto \theta^{z,Q}$ in $\Theta_k$. Suppose $\theta^* = \theta^{z^*, Q^*} \in \Theta_k$, and consider a ball $B(z)$ in $\Theta_k(z)$ centered
at $\theta^*$ of the form $B(z) = \{\theta \in \Theta_k(z) : \|\theta - \theta^*\| < t\}$ for some $t > 0$. If $z^* = z$, then it is
straightforward to observe that

$$\|\theta^{z,Q} - \theta^{z^*,Q^*}\|^2 = \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (Q_{rs} - Q^*_{rs})^2, \tag{19}$$

where recall that $n_r = \sum_{i=1}^{n} 1(z_i = r)$, $r = 1, \ldots, k$. Therefore, although a subset of
$[0,1]^{n \times n}$, $B(z)$ can be identified with a $k^2$-dimensional ellipsoid in $[0,1]^{k \times k}$. When $z^* \ne z$,
one no longer has a nice identity as above and the geometry of $B(z)$ is more difficult to
describe. However, we show below in Lemma 4.3 that $B(z)$ is always contained inside a set
$\tilde{B}(z)$ in $\Theta_k(z)$ which can be identified with a $k^2$-dimensional ellipsoid in $[0,1]^{k \times k}$. Recall
our convention for describing ellipsoids from (1).

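Lemma 4.3. Fix $z^* \in \mathcal{Z}_{n,k}$, $Q^* \in [0,1]^{k \times k}$, and let $\theta^* = \theta^{z^*,Q^*}$. For $z \in \mathcal{Z}_{n,k}$ and $t > 0$,
let $B(z) = \{\theta \in \Theta_k(z) : \|\theta - \theta^*\| < t\}$. Set $W_{rs} = n_r n_s/t^2$ and $W = (W_{rs})$, where
$n_r = \sum_{i=1}^{n} 1(z_i = r)$, $r = 1, \ldots, k$. Then, $B(z) \subset \tilde{B}(z)$, where

$$\tilde{B}(z) = \{\theta^{z,Q} : Q \in \mathcal{E}_{k^2}(\bar{Q}^*, W) \cap [0,1]^{k \times k}\} \tag{20}$$

for some $\bar{Q}^* \in [0,1]^{k \times k}$ depending on $Q^*$, $z^*$ and $z$. In particular, if $z^* = z$, then $\bar{Q}^* = Q^*$
and the containment becomes equality, i.e., $B(z) = \tilde{B}(z)$.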
Remark 4.2. From (1), $\mathcal{E}_{k^2}(\bar{Q}^*, W)$ in Lemma 4.3 is the collection of all $Q$ satisfying
$\sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (Q_{rs} - \bar{Q}^*_{rs})^2 \le t^2$. The last part of Lemma 4.3 is consistent with the
discussion preceding (19). When $z^* = z$, (19) implies that $B(z)$ consists of all $\theta^{z,Q}$ with
$Q \in [0,1]^{k \times k}$ satisfying $\sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (Q_{rs} - Q^*_{rs})^2 < t^2$.


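Proof. We begin by constructing $\bar{Q}^*$. For $1 \le r, r' \le k$, let $I_{r,r'} = z^{-1}(r) \cap (z^*)^{-1}(r')$
and $n_{r,r'} = |I_{r,r'}|$. Clearly $\{I_{r,r'}, r' = 1, \ldots, k\}$ is a partition of $z^{-1}(r)$ and hence $n_r =
\sum_{r'=1}^{k} n_{r,r'}$. With these notations, define

$$\bar{Q}^*_{rs} = \frac{1}{n_r n_s} \sum_{r'=1}^{k} \sum_{s'=1}^{k} n_{r,r'} \, n_{s,s'} \, Q^*_{r's'}. \tag{21}$$

Clearly, $\bar{Q}^*_{rs}$ is a weighted average of the entries of $Q^*$, with weights proportional to $n_{r,r'} n_{s,s'}$, and
therefore $\bar{Q}^* \in [0,1]^{k \times k}$. For $\theta^{z,Q} \in \Theta_k(z)$, we have

$$\|\theta^{z,Q} - \theta^*\|^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (Q_{z_i z_j} - Q^*_{z^*_i z^*_j})^2.$$

Expanding the squares, the term $\sum_{i=1}^{n} \sum_{j=1}^{n} Q^2_{z_i z_j} = \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s Q^2_{rs}$. The cross product
term can be simplified to

$$\sum_{i=1}^{n} \sum_{j=1}^{n} Q_{z_i z_j} Q^*_{z^*_i z^*_j} = \sum_{r=1}^{k} \sum_{s=1}^{k} \sum_{r'=1}^{k} \sum_{s'=1}^{k} n_{r,r'} \, n_{s,s'} \, Q_{rs} Q^*_{r's'} = \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s Q_{rs} \bar{Q}^*_{rs}. \tag{22}$$

In view of these identities, we can write

$$\sum_{i=1}^{n} \sum_{j=1}^{n} (Q_{z_i z_j} - Q^*_{z^*_i z^*_j})^2 = \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (Q_{rs} - \bar{Q}^*_{rs})^2 + \Big\{ \sum_{i=1}^{n} \sum_{j=1}^{n} (Q^*_{z^*_i z^*_j})^2 - \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (\bar{Q}^*_{rs})^2 \Big\}. \tag{23}$$

We shall show below that the expression in braces in (23) is non-negative. This completes the proof,
since we then have $\sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (Q_{rs} - \bar{Q}^*_{rs})^2 \le \|\theta^{z,Q} - \theta^*\|^2 < t^2$ (see Remark 4.2).
Recalling the definition of $\bar{Q}^*_{rs}$ from (21), the expression in braces in (23) can be written as

$$\sum_{i=1}^{n} \sum_{j=1}^{n} (Q^*_{z^*_i z^*_j})^2 - \sum_{r=1}^{k} \sum_{s=1}^{k} n_r n_s (\bar{Q}^*_{rs})^2 = \sum_{r=1}^{k} \sum_{s=1}^{k} \Big[ \sum_{r'=1}^{k} \sum_{s'=1}^{k} n_{r,r'} \, n_{s,s'} \, (Q^*_{r's'})^2 - \frac{\big( \sum_{r'=1}^{k} \sum_{s'=1}^{k} n_{r,r'} \, n_{s,s'} \, Q^*_{r's'} \big)^2}{n_r n_s} \Big].$$

The non-negativity of this quantity now follows from the Cauchy-Schwarz inequality
$(\sum_a x_a y_a)^2 \le \sum_a x_a^2 \sum_a y_a^2$, applied with $x_{(r',s')} = Q^*_{r's'} \sqrt{n_{r,r'} n_{s,s'}}$ and $y_{(r',s')} = \sqrt{n_{r,r'} n_{s,s'}}$, and the fact that
$\sum_{r'=1}^{k} \sum_{s'=1}^{k} n_{r,r'} \, n_{s,s'} = n_r n_s$.

When $z^* = z$, it is clear that $n_{r,r'} = n_r \delta_{rr'}$ and hence $\bar{Q}^* = Q^*$. All subsequent inequalities
then become equalities and the last part is proved.

□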
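The decomposition used in this proof can be checked numerically on a small example. In the sketch below, the helper names `block_average` and `sq_dist`, the 0-based labels and the particular clusterings and block matrices are all assumptions for illustration; every cluster of `z` is taken to be non-empty so that the averages are well defined.

```python
def block_average(z, z_star, Q_star, k):
    """Qbar[r][s] = (n_r n_s)^{-1} sum_{r',s'} n_{r,r'} n_{s,s'} Q_star[r'][s'], as in (21).

    Assumes every cluster of z is non-empty; labels are 0-based.
    """
    # n_cross[r][r'] = number of nodes with z_i = r and z*_i = r'.
    n_cross = [[0] * k for _ in range(k)]
    for zi, zsi in zip(z, z_star):
        n_cross[zi][zsi] += 1
    n_r = [sum(row) for row in n_cross]
    Qbar = [[sum(n_cross[r][rp] * n_cross[s][sp] * Q_star[rp][sp]
                 for rp in range(k) for sp in range(k)) / (n_r[r] * n_r[s])
             for s in range(k)] for r in range(k)]
    return Qbar, n_r

def sq_dist(z, Q, z_star, Q_star):
    """||theta^{z,Q} - theta^{z*,Q*}||^2 computed entry by entry."""
    return sum((Q[zi][zj] - Q_star[si][sj]) ** 2
               for zi, si in zip(z, z_star) for zj, sj in zip(z, z_star))

z      = [0, 0, 0, 1, 1, 1]
z_star = [0, 0, 1, 1, 1, 0]
Q      = [[0.3, 0.6], [0.5, 0.2]]
Q_star = [[0.8, 0.1], [0.4, 0.7]]
k = 2

Qbar, n_r = block_average(z, z_star, Q_star, k)
lhs = sq_dist(z, Q, z_star, Q_star)
ellipsoid_part = sum(n_r[r] * n_r[s] * (Q[r][s] - Qbar[r][s]) ** 2
                     for r in range(k) for s in range(k))
remainder = lhs - ellipsoid_part  # the term shown to be non-negative in the proof
```

Here `ellipsoid_part` never exceeds `lhs`, which is the containment asserted by the lemma, and the two coincide when `z_star` equals `z`.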


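Corollary 4.4. Inspecting the proof of Lemma 4.3, the condition $Q \in [0,1]^{k \times k}$ is only
used to show that $\bar{Q}^* \in [0,1]^{k \times k}$. If we let $Q$ be unrestricted, then the containment
relation continues to hold as subsets of $\mathbb{R}^{n \times n}$, i.e.,

$$\{\theta^{z,Q} : Q \in \mathbb{R}^{k \times k}, \; \|\theta^{z,Q} - \theta^*\| < t\} \subset \{\theta^{z,Q} : Q \in \mathcal{E}_{k^2}(\bar{Q}^*, W)\}, \tag{24}$$

with equality when $z^* = z$.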

Lemma 4.3 and Corollary 4.4 crucially exploit the lower dimensional structure
underlying the parameter space $\Theta_k$ and are used subsequently multiple times. First, recall from
(6) that one needs a handle on $p(Q : \theta^{z,Q} \in U)$ to bound the prior probability of $U \subset \Theta_k$.
In particular, if $U = \{\|\theta - \theta^0\| < t\}$, then $p(Q : \theta^{z,Q} \in U)$ equals the volume of $U \cap \Theta_k(z)$,
which can be suitably bounded by the volume of the bounding $k^2$-dimensional ellipsoid.
Second, a handle on the size of balls in $\Theta_k$ facilitates calculating the complexity of the
model space (in terms of metric entropy), which is pivotal in proving the posterior
concentration; in particular, it allows us to extend the test function in Lemma 4.2 to construct test functions
against more complex alternatives in Lemma 4.5 below. Once again, the dimensionality
reduction is key to preventing the metric entropy from growing too fast.
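Lemma 4.5. Recall $\epsilon_n$ from Theorem 4.1. Assume $\theta^0 \in \Theta_k$ and for $l \ge 1$, let $U_{l,n} = \{\theta \in
\Theta_k : l n \epsilon_n \le \|\theta - \theta^0\| < (l + 1) n \epsilon_n\}$. Based on $A_{ij} \overset{\text{ind}}{\sim} \mathrm{Bernoulli}(\theta_{ij})$ for $i, j = 1, \ldots, n$,
consider testing $H_0 : \theta = \theta^0$ versus $H_1 : \theta \in U_{l,n}$. There exists a test function $\Phi_{l,n}$ such that

$$E_{\theta^0}(\Phi_{l,n}) \le \exp\{-C_1 l^2 n^2 \epsilon_n^2\}, \quad \sup_{\theta \in U_{l,n}} E_{\theta}(1 - \Phi_{l,n}) \le \exp\{-C_2 l^2 n^2 \epsilon_n^2\}, \tag{25}$$

for constants $C_1, C_2 > 0$ independent of $n$.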

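Proof. Since $\theta^0 \in \Theta_k$, there exist $z^0 \in \mathcal{Z}_{n,k}$ and $Q^0 \in [0,1]^{k \times k}$ with $\theta^0 = \theta^{z^0,Q^0}$. For
$z \in \mathcal{Z}_{n,k}$, define $U_{l,n}(z) = U_{l,n} \cap \Theta_k(z)$, where $\Theta_k(z)$ is as in (18). Clearly,

$$U_{l,n}(z) = \{\theta^{z,Q} : Q \in [0,1]^{k \times k}, \; l n \epsilon_n \le \|\theta^{z,Q} - \theta^0\| < (l + 1) n \epsilon_n\}, \tag{26}$$

and $U_{l,n} \subset \cup_{z \in \mathcal{Z}_{n,k}} U_{l,n}(z)$. We first use Lemma 4.2 to construct tests against $U_{l,n}(z)$ for
fixed $z$. Our desired test is obtained by taking the maximum of all such test functions.

Fix $z \in \mathcal{Z}_{n,k}$. Let $\mathcal{N}_{l,n}(z) = \{\theta_{l,n,h} \in U_{l,n}(z) : h \in I_{l,n}(z)\}$ be a maximal $l n \epsilon_n/2$-separated
set inside $U_{l,n}(z)$ for some index set $I_{l,n}(z)$; i.e., $\mathcal{N}_{l,n}(z)$ is such that $\|\theta^1 - \theta^2\| \ge
l n \epsilon_n/2$ for all $\theta^1 \ne \theta^2 \in \mathcal{N}_{l,n}(z)$, and no subset of $U_{l,n}(z)$ containing $\mathcal{N}_{l,n}(z)$ has this
property. We provide a volume argument to determine an upper bound for $|I_{l,n}(z)|$, the
cardinality of $\mathcal{N}_{l,n}(z)$. The separation property implies that Euclidean balls of radius $l n \epsilon_n/4$
centered at the points in $\mathcal{N}_{l,n}(z)$ are disjoint. Since $B_h := \{\theta^{z,Q} : Q \in \mathbb{R}^{k \times k}, \|\theta^{z,Q} -
\theta_{l,n,h}\| < l n \epsilon_n/4\}$ is contained inside a Euclidean ball of radius $l n \epsilon_n/4$ centered at $\theta_{l,n,h}$,
the sets $B_h$ are disjoint as $h$ varies over $I_{l,n}(z)$. By the triangle inequality, all $B_h$'s lie
inside $B^+ = \{\theta^{z,Q} : Q \in \mathbb{R}^{k \times k}, \|\theta^{z,Q} - \theta^0\| < (5l/4 + 1) n \epsilon_n\}$, since $\|\theta^{z,Q} - \theta^0\| \le \|\theta^{z,Q} -
\theta_{l,n,h}\| + \|\theta_{l,n,h} - \theta^0\| < (l + 1) n \epsilon_n + l n \epsilon_n/4$.

It should be noted that the sets $B_h$ and $B^+$ are constructed in a way that $Q$ is not
restricted to be inside $[0,1]^{k \times k}$. This allows us to invoke Corollary 4.4 to identify $B_h$
and $B^+$ with appropriate ellipsoids in $\mathbb{R}^{k \times k}$ and simplify volume calculations. First, since
$\theta_{l,n,h} \in \Theta_k(z)$ for each $h$, it follows from (the equality part of) Corollary 4.4 that $B_h =
\{\theta^{z,Q} : Q \in \mathcal{E}_{k^2}(Q_h, W)\}$, with $Q_h$ constructed as in the proof of Lemma 4.3 (so that
$\theta_{l,n,h} = \theta^{z,Q_h}$) and $W_{rs} =
n_r n_s/\{(l n \epsilon_n/4)^2\}$. The equality is crucially used below; also note that $W$ does not depend on
$h$. Invoking Corollary 4.4 one more time, we obtain $B^+ \subset \{\theta^{z,Q} : Q \in \mathcal{E}_{k^2}(\bar{Q}^0, \tilde{W})\}$, with
$\tilde{W}_{rs} = n_r n_s/[\{(5l/4 + 1) n \epsilon_n\}^2]$. We conclude that the Euclidean ellipsoids $\mathcal{E}_{k^2}(Q_h, W)$ are
disjoint as $h$ varies over $I_{l,n}(z)$ and all of them are contained in $\mathcal{E}_{k^2}(\bar{Q}^0, \tilde{W})$. Comparing
volumes,

$$|\mathcal{E}_{k^2}(Q_h, W)| \, |I_{l,n}(z)| \le |\mathcal{E}_{k^2}(\bar{Q}^0, \tilde{W})|.$$

Using the volume formula in (2) and canceling out common terms, we finally have

$$|I_{l,n}(z)| \le \Big\{ \frac{5l/4 + 1}{l/4} \Big\}^{k^2} \le 9^{k^2}. \tag{27}$$

We are now in a position to construct the test. The maximality of $\mathcal{N}_{l,n}(z)$ implies that
$\mathcal{N}_{l,n}(z)$ is an $l n \epsilon_n/2$-net of $U_{l,n}(z)$, i.e., the sets $E_{l,n,z,h} = \{\theta \in [0,1]^{n \times n} : \|\theta - \theta_{l,n,h}\| <
l n \epsilon_n/2\}$ cover $U_{l,n}(z)$ as $h$ varies. For each $\theta_{l,n,h} \in \mathcal{N}_{l,n}(z)$, consider testing $H_0 : \theta = \theta^0$
versus $H_1 : \theta \in E_{l,n,z,h}$ using the test function from Lemma 4.2. Lemma 4.2 is applicable
since $\|\theta_{l,n,h} - \theta^0\| \ge l n \epsilon_n$; let $\Phi_{l,n,z,h}$ denote the corresponding test with type-I and II
errors bounded above by $e^{-C l^2 n^2 \epsilon_n^2}$. Define $\Phi_{l,n} = \max_{z \in \mathcal{Z}_{n,k}} \max_{h \in I_{l,n}(z)} \Phi_{l,n,z,h}$. For any
$\theta \in U_{l,n}$, there exist $z \in \mathcal{Z}_{n,k}$ and $h \in I_{l,n}(z)$ such that $\theta \in E_{l,n,z,h}$, so that $E_{\theta}(1 - \Phi_{l,n}) \le
E_{\theta}(1 - \Phi_{l,n,z,h}) \le e^{-C l^2 n^2 \epsilon_n^2}$. Taking a supremum over $\theta \in U_{l,n}$ delivers the desired type-II
error. Further, the type-I error of $\Phi_{l,n}$ can be bounded as

$$E_{\theta^0}(\Phi_{l,n}) \le \sum_{z \in \mathcal{Z}_{n,k}} \sum_{h \in I_{l,n}(z)} E_{\theta^0}(\Phi_{l,n,z,h}) \le k^n \, 9^{k^2} e^{-C l^2 n^2 \epsilon_n^2}, \tag{28}$$

since $|\mathcal{Z}_{n,k}| = k^n$ and, by (27), $|I_{l,n}(z)| \le 9^{k^2}$ for all $z$. The conclusion follows since
$n^2 \epsilon_n^2 = k^2 \log(n/k) + n \log k \gtrsim k^2 + n \log k$.

□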
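The proof relies on the fact that a maximal separated set is automatically a net. The following sketch illustrates this with a greedy construction on a finite point cloud in the plane; the finite grid stands in for $U_{l,n}(z)$ and, like the function name `greedy_separated_set`, is an assumption made purely for illustration.

```python
import math

def greedy_separated_set(points, delta):
    """Greedily pick points that are pairwise at least `delta` apart.

    The result is maximal within `points`: every point that was not picked
    lies at distance < delta from some picked point, so the picked set is a delta-net.
    """
    chosen = []
    for p in points:
        if all(math.dist(p, q) >= delta for q in chosen):
            chosen.append(p)
    return chosen

# A small grid of points in the unit square, standing in for the set being covered.
points = [(x / 4, y / 4) for x in range(5) for y in range(5)]
net = greedy_separated_set(points, delta=0.5)
```

Balls of radius `delta / 2` around the chosen points are disjoint, which is what makes the volume comparison leading to (27) possible.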


5 Proof of Theorem 4.1

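Proof. Let $E_0/P_0$ denote an abbreviation of $E_{\theta^0}/P_{\theta^0}$. Since $\theta^0 \in \Theta_k$, there exist $z^0 \in
\mathcal{Z}_{n,k}$ and $Q^0 \in [0,1]^{k \times k}$ with $\theta^0 = \theta^{z^0,Q^0}$. Recall $\epsilon_n^2 = k^2 \log(n/k)/n^2 + \log k/n$ and define
$U_n = \{\theta \in \Theta_k : \|\theta - \theta^0\|^2 > M^2 n^2 \epsilon_n^2\}$ for some large constant $M > 0$ to be chosen later.
Letting $f_{\theta_{ij}}(A_{ij}) = \theta_{ij}^{A_{ij}} (1 - \theta_{ij})^{1 - A_{ij}}$ denote the $\mathrm{Bernoulli}(\theta_{ij})$ likelihood evaluated at $A_{ij}$,
the posterior probability assigned to $U_n$ can be written as

$$\Pi_n(U_n \mid A) = \frac{\int_{U_n} \prod_{i=1}^{n} \prod_{j=1}^{n} \{f_{\theta_{ij}}(A_{ij})/f_{\theta^0_{ij}}(A_{ij})\} \, \Pi(d\theta)}{\int_{\Theta_k} \prod_{i=1}^{n} \prod_{j=1}^{n} \{f_{\theta_{ij}}(A_{ij})/f_{\theta^0_{ij}}(A_{ij})\} \, \Pi(d\theta)} = \frac{N_n}{D_n}, \tag{29}$$

where $N_n$ and $D_n$ respectively denote the numerator and denominator of the fraction in
(29). Let $\mathcal{F}_n$ denote the $\sigma$-field generated by $A = (A_{ij})$, with $A_{ij}$ independently distributed
as $\mathrm{Bernoulli}(\theta^0_{ij})$, the true data generating distribution. We first claim, in Lemma 5.1, that there exists a
set $\mathcal{A}_n \in \mathcal{F}_n$, with large probability under $P_0$, on which $D_n$ can be bounded from below.
A proof can be adapted from Lemma 10 of Ghosal & van der Vaart (2007) and is hence
omitted.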

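Lemma 5.1. Assume $\theta^0$ satisfies the conditions of Theorem 4.1. Then, there exists a set
$\mathcal{A}_n$ in the $\sigma$-field $\mathcal{F}_n$ with $\lim_{n \to \infty} P_0(\mathcal{A}_n) = 1$ such that within $\mathcal{A}_n$,

$$D_n \ge e^{-C n^2 \epsilon_n^2} \, \Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2).$$

In view of Lemma 5.1, it is sufficient to prove that

$$\lim_{n \to \infty} E_0\{\Pi_n(U_n \mid A) \, 1_{\mathcal{A}_n}\} = 0.$$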

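For $l \ge M$, let $U_{l,n} = \{\theta \in \Theta_k : l^2 n^2 \epsilon_n^2 < \|\theta - \theta^0\|^2 \le (l + 1)^2 n^2 \epsilon_n^2\}$ denote an annulus
in $\Theta_k$ centered at $\theta^0$ with inner and outer Euclidean radii $l n \epsilon_n$ and $(l + 1) n \epsilon_n$ respectively.
Using a standard testing argument (see, for example, the proof of Proposition 5.1
in Castillo & van der Vaart (2012)) in conjunction with Lemma 5.1, one arrives at

$$E_0\{\Pi_n(U_n \mid A) \, 1_{\mathcal{A}_n}\} \le \sum_{l=M}^{\infty} \Big\{ E_0(\Phi_{l,n}) + \beta_{l,n} \sup_{\theta \in U_{l,n}} E_{\theta}(1 - \Phi_{l,n}) \Big\}, \tag{30}$$

where

$$\beta_{l,n} = \frac{\Pi(U_{l,n})}{e^{-C n^2 \epsilon_n^2} \, \Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2)}, \tag{31}$$

and $\Phi_{l,n}$ is the test function constructed in Lemma 4.5 for testing $H_0 : \theta = \theta^0$ versus
$H_1 : \theta \in U_{l,n}$ with error rates as in (25). Recall $U_{l,n}(z) = U_{l,n} \cap \Theta_k(z)$ and its equivalent
representation in (26) from the proof of Lemma 4.5. Since $U_{l,n} \subset \cup_{z \in \mathcal{Z}_{n,k}} U_{l,n}(z)$, from (6),

$$\Pi(U_{l,n}) \le \sum_{z \in \mathcal{Z}_{n,k}} \Pi\{U_{l,n}(z)\} = \sum_{z \in \mathcal{Z}_{n,k}} \Pi\{U_{l,n}(z) \mid z\} \, p(z), \tag{32}$$

where $p(z)$ is the prior probability (11) of $z$ under the Dirichlet-multinomial prior. By
an application of Lemma 4.3, $U_{l,n}(z) \subset \{\theta^{z,Q} : Q \in \mathcal{E}_{k^2}(\bar{Q}^0, W) \cap [0,1]^{k \times k}\}$ with $W_{rs} =
n_r n_s/\{(l + 1) n \epsilon_n\}^2$, $1 \le r, s \le k$. Therefore, $\Pi\{U_{l,n}(z) \mid z\}$ is bounded above by the
probability of the set $\mathcal{E}_{k^2}(\bar{Q}^0, W) \cap [0,1]^{k \times k}$ under the uniform prior on $Q$, which in turn
can be bounded above by the Euclidean volume of $\mathcal{E}_{k^2}(\bar{Q}^0, W)$. Using volume formula (2),

$$\Pi\{U_{l,n}(z) \mid z\} \le |\mathcal{E}_{k^2}(\bar{Q}^0, W)| = \frac{\pi^{k^2/2}}{\Gamma(k^2/2 + 1)} \prod_{r=1}^{k} \prod_{s=1}^{k} \frac{(l + 1) n \epsilon_n}{\sqrt{n_r n_s}}. \tag{33}$$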

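Next, consider the term $\Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2)$ in the denominator of the expression for $\beta_{l,n}$.
Bound $\Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2) \ge \Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2 \mid z = z^0) \, p(z^0)$ and, using Lemma 4.3
once again,

$$\Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2 \mid z = z^0) = p\Big\{ Q : \sum_{r=1}^{k} \sum_{s=1}^{k} n_{0r} n_{0s} (Q_{rs} - Q^0_{rs})^2 < n^2 \epsilon_n^2 \Big\}. \tag{34}$$

The probability in the right hand side of the above display is the volume of the intersection
of an ellipsoid with $[0,1]^{k \times k}$, and therefore we cannot simply replace the probability by the
volume of the ellipsoid. Instead, we embed an appropriate rectangle inside the intersection
of the ellipsoid and $[0,1]^{k \times k}$. We claim that

$$\prod_{r=1}^{k} \prod_{s=1}^{k} [Q^0_{rs} - \epsilon_n/2, \, Q^0_{rs} + \epsilon_n/2] \subset \Big\{ Q \in [0,1]^{k \times k} : \sum_{r=1}^{k} \sum_{s=1}^{k} n_{0r} n_{0s} (Q_{rs} - Q^0_{rs})^2 < n^2 \epsilon_n^2 \Big\}. \tag{35}$$

First, based on our assumption that all entries of $Q^0$ are bounded away from 0 and 1
and the fact that $\epsilon_n \to 0$, it is immediate that the rectangle is contained in $[0,1]^{k \times k}$ for
sufficiently large $n$. Second, for any $Q$ with $|Q_{rs} - Q^0_{rs}| \le \epsilon_n/2$ for all $1 \le r, s \le k$, we have

$$\sum_{r=1}^{k} \sum_{s=1}^{k} n_{0r} n_{0s} (Q_{rs} - Q^0_{rs})^2 \le \frac{\epsilon_n^2}{4} \sum_{r=1}^{k} \sum_{s=1}^{k} n_{0r} n_{0s} = \frac{n^2 \epsilon_n^2}{4},$$

thereby proving the claim in (35). Now we can bound $\Pi(\|\theta - \theta^0\|^2 < n^2 \epsilon_n^2 \mid z = z^0)$ from
below by the volume of the rectangle, which equals $\epsilon_n^{k^2}$. Using this fact along with the
bounds (32), (33) and (34), we have from (31) that

$$e^{-C n^2 \epsilon_n^2} \beta_{l,n} \le \frac{\pi^{k^2/2} \{(l + 1) n\}^{k^2}}{\Gamma(k^2/2 + 1)} \sum_{z \in \mathcal{Z}_{n,k}} \frac{p(z)}{p(z^0)}. \tag{36}$$

Since $n_{0r} \ge 1$ for all $r = 1, \ldots, k$, invoke Lemma 3.1 to bound $\sum_{z \in \mathcal{Z}_{n,k}} p(z)/p(z^0) \le e^{C n \log k}$,
since $|\mathcal{Z}_{n,k}| = k^n$. Next, use the well-known fact (see, for example,
Abramowitz & Stegun (1964)) that for any $\alpha > 0$, $\Gamma(\alpha + 1) \ge \sqrt{2\pi} \, \alpha^{\alpha + 1/2} e^{-\alpha}$ to obtain

$$e^{-C n^2 \epsilon_n^2} \beta_{l,n} \le \{\sqrt{2 \pi e} \, (l + 1)\}^{k^2} (n/k)^{k^2} e^{C n \log k} \le \{\sqrt{2 \pi e} \, (l + 1)\}^{k^2} e^{C n^2 \epsilon_n^2}. \tag{37}$$

Substituting in (30), the expression in (30) converges to zero for all $M$ larger than a suitable
constant.

□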


6 Discussion 


In this article, we presented a theoretical investigation of posterior contraction in stochastic
block models. One crucial assumption in our current results is that the true number of
clusters $k$ is known. An interesting direction is to develop a fully Bayesian approach by
placing a prior on $k$ and to show that the corresponding procedure yields optimal rates of
posterior convergence adaptively for all values of $k \in \{1, 2, \ldots, n\}$. Such an approach can
be connected to nonparametric estimation of networks (Bickel & Chen, 2009), where one
typically assumes a more flexible way of data generation: $A_{ij} \mid \xi_i, \xi_j \sim \mathrm{Bernoulli}\{f(\xi_i, \xi_j)\}$,
where $f$ is a function from $[0,1]^2$ to $[0,1]$, called a graphon, and the $\xi_i$'s are iid random
variables on $[0,1]$. It is well known (refer, for example, to Airoldi et al. (2013)) that one can
approximate a sufficiently smooth graphon using elements of $\Theta_k$. When the smoothness
of the graphon is unknown, the prior on $k$ should facilitate the posterior to concentrate in
the appropriate region. Using such approximation results and modifying our Theorem 4.1,
it may be possible to derive posterior convergence rates for estimating a graphon.